Confidence Intervals

Terms

Point Estimate is a single value calculated from a sample of data that is used to estimate the value of an unknown parameter of a population.


Confidence Interval is a range of values that likely contains the true population parameter with a certain level of confidence.


Level of Confidence is the share of intervals (e.g., 95%) that would capture the true population parameter if we repeated the same sampling procedure many times.

Before we discuss confidence intervals, we need to learn about bootstrapping 🥾

Bootstrapping

This is a technique used to estimate the sampling distribution of a statistic by repeatedly re-sampling from the observed data with replacement.

Bootstrap Example

This is our sample.

\[(-5,0)*--*-*-*-*---(0,0)--\stackrel{\mu}{|}--*--*--*-*-- (5,0)\]

Then randomly pull 3 values, which become our data.

\[(-5,0)---*-----(0,0) \stackrel{\mu}{|}----*-----*-- (5,0)\]

Plot the mean of those 3 values, then throw them back in (i.e., sampling with replacement) and pull another 3.

\[(-5,0)--------(0,0)----*--*-\stackrel{\mu}{|}-*---- (5,0)\]

Plot that mean and repeat!

Bootstrap Example

Eventually we will have tons of means. We only show 9 in our example here, but in practice we would have hundreds or even thousands, huddled around a similar point on our number line.


\[(-5,0)-\stackrel{\mu}{|}----\stackrel{\mu}{|}---(0,0)---\stackrel{\mu}{|}-\stackrel{\mu}{|}-\stackrel{\mu}{|}-\stackrel{\mu}{|}-\stackrel{\mu}{|}-\stackrel{\mu}{|}---\stackrel{\mu}{|}- (5,0)\]


Once we have these, we have a solid picture of where the sample mean tends to land and how much it varies from one resample to the next.
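The resampling loop above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `sample` values are made up, and `bootstrap_means` is a hypothetical helper name. One difference from the picture: the slides pull 3 values at a time to keep the drawing small, while the usual bootstrap draws as many values as the original sample has, which is what this sketch does by default.

```python
import random
import statistics

def bootstrap_means(sample, n_resamples=1000, resample_size=None, seed=0):
    """Return the mean of each resample drawn with replacement from `sample`."""
    rng = random.Random(seed)            # fixed seed so the run is repeatable
    size = resample_size or len(sample)  # default: same size as the sample
    return [
        statistics.fmean(rng.choices(sample, k=size))  # choices() samples with replacement
        for _ in range(n_resamples)
    ]

# Hypothetical observations on the -5..5 number line (not taken from the slides)
sample = [-4.2, -3.1, -2.5, -1.8, -0.9, 1.2, 2.0, 2.9, 3.6]

means = bootstrap_means(sample, n_resamples=1000)
print(len(means), "bootstrap means, centered near", round(statistics.fmean(means), 2))
```

Passing `resample_size=3` reproduces the "pull 3 values" version shown in the drawing.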

Great, but what does this have to do with confidence intervals?

Confidence Intervals

Well, when we bootstrapped we essentially created an empirical sampling distribution.


\[(-5,0)-\stackrel{\mu}{|}----\stackrel{\mu}{|}---(0,0)---\stackrel{\mu}{|}-\stackrel{\mu}{|}-\stackrel{\mu}{|}-\stackrel{\mu}{|}-\stackrel{\mu}{|}-\stackrel{\mu}{|}---\stackrel{\mu}{|}- (5,0)\]


And as a result, we can identify the percentiles that correspond to our desired confidence level.


These percentiles from the bootstrap replicates become the lower and upper bounds of what we will call a confidence interval.
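Here is a minimal sketch of that percentile idea, assuming a made-up `sample` and two hypothetical helpers, `bootstrap_means` and `percentile_interval`. For a 95% level it sorts the bootstrap means and reads off the values sitting at roughly the 2.5th and 97.5th percentiles. Picking the nearest sorted position is one simple convention; other percentile rules give slightly different bounds.

```python
import random
import statistics

def bootstrap_means(sample, n_resamples=2000, seed=0):
    """Means of resamples drawn with replacement, each the size of `sample`."""
    rng = random.Random(seed)
    n = len(sample)
    return [statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_resamples)]

def percentile_interval(replicates, confidence=0.95):
    """Lower and upper bounds taken from the sorted bootstrap replicates."""
    ordered = sorted(replicates)
    tail = (1 - confidence) / 2           # 0.025 in each tail for a 95% level
    last = len(ordered) - 1
    lower = ordered[round(tail * last)]        # about the 2.5th percentile
    upper = ordered[round((1 - tail) * last)]  # about the 97.5th percentile
    return lower, upper

# Hypothetical observations (not taken from the slides)
sample = [-4.2, -3.1, -2.5, -1.8, -0.9, 1.2, 2.0, 2.9, 3.6]

lower, upper = percentile_interval(bootstrap_means(sample), confidence=0.95)
print(f"95% bootstrap CI for the mean: {lower:.2f} to {upper:.2f}")
```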

Confidence Intervals

Now that we have this sampling distribution we can calculate our 95% confidence interval.


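Which simply means that if we repeated this whole process many times, about 95% of the intervals we built would contain the population mean. For this one interval, we are 95% confident that the population mean falls between the two red lines.


And about 5% of the time, an interval built this way would miss the population mean.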


\[(-5,0)-\stackrel{\mu}{|}--\stackrel{\mu}{|}---(0,0)---\color{red}\vert-\stackrel{\mu}{|}-\stackrel{\mu}{|}-\stackrel{\mu}{|}-\stackrel{\mu}{|}-\stackrel{\mu}{|}-\stackrel{\mu}{|}-\color{red}\vert--\stackrel{\mu}{|}- (5,0)\]

Time for Math! 🧮

Confidence Interval

\[\text{CI} = \bar{x} \pm z \frac{s}{\sqrt{n}}\]


\(\bar{x}\) = sample mean

\(z\) = critical value (z score) for the chosen confidence level

\(s\) = sample standard deviation

\(n\) = sample size

  • Large samples are \(\geq 30\)
  • Small samples are \(<30\)

Confidence Levels
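Each confidence level maps to a z score on the standard normal curve. As a quick sketch, Python's standard library can compute them; `z_critical` is a hypothetical helper name, and the three levels shown are just common choices.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided z score: the middle `confidence` share of the standard normal lies within +/- z."""
    upper_tail_point = 1 - (1 - confidence) / 2   # e.g. 0.975 for a 95% level
    return NormalDist().inv_cdf(upper_tail_point)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} -> z = {z_critical(level):.3f}")
# 90% -> z = 1.645
# 95% -> z = 1.960
# 99% -> z = 2.576
```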



Confidence Interval Example

A researcher wants to estimate the average number of hours per week spent by employees in a certain company on remote work activities.

They take a random sample of 80 employees and find that the average number of hours spent on remote work per week in the sample is 12 hours, with a standard deviation of 2 hours.

Calculate the 95% confidence interval for the average number of hours spent on remote work activities per week by all employees in the company.

Confidence Interval Example

1) Find the z score

2) Add known values to equation

3) Calculate the standard error (SE): \[\frac{s}{\sqrt{n}}\]

4) Multiply the value from step 3 by our z value

5) Subtract that from our \(\bar{x}\) for our Lower Bound CI

6) Add that to our \(\bar{x}\) for our Upper Bound CI

\[\text{CI} = \bar{x} \pm z \frac{s}{\sqrt{n}}\]


  • 95% CI: 11.5617 \(\leq\) \(\mu\) \(\leq\) 12.4383
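The six steps can be checked with a short script. This is a sketch, not the only way to do it: `z_interval` is a hypothetical helper, and it uses the exact z score from the standard library rather than the rounded 1.96, which makes no difference at two decimals.

```python
import math
from statistics import NormalDist

def z_interval(mean, sd, n, confidence=0.95):
    """Large-sample confidence interval: mean +/- z * sd / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # step 1: z score
    se = sd / math.sqrt(n)                              # step 3: standard error
    margin = z * se                                     # step 4: margin of error
    return mean - margin, mean + margin                 # steps 5 and 6

# Remote work example: n = 80, sample mean = 12 hours, s = 2 hours
lower, upper = z_interval(mean=12, sd=2, n=80)
print(f"95% CI: {lower:.2f} to {upper:.2f} hours")
# 95% CI: 11.56 to 12.44 hours
```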

Confidence Interval for Small Samples

With smaller samples (n < 30), we cannot lean on the Central Limit Theorem in the same way, and \(s\) is a shakier estimate of the population standard deviation, so we use a different distribution called the t distribution.


In order to locate our t-value, we have to locate the degrees of freedom.

Degrees of Freedom

The number of independent values that are free to vary in the calculation of a statistic.


\[df = n - 1\]

n = sample size


But why n-1?

n-1 with respect to Degrees of Freedom

Let’s consider an example where 4 people are each asked to pick a number, and the 4 numbers must add up to 50.

  • Person 1 chooses 10

  • Person 2 chooses 20

  • Person 3 chooses 10

  • Person 4 chooses…… Oof they can’t choose anything! They are no longer free to vary.

  • They have to choose 10 because we want a sum that equals 50.
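The same story in a few lines of Python, as a small sketch (`forced_last_value` is a hypothetical helper name). The second half shows how this connects to statistics: once the sample mean is fixed, the deviations from it must sum to zero, so only n - 1 of them are free to vary.

```python
def forced_last_value(total, free_choices):
    """With the total fixed, the last value is whatever is left over."""
    return total - sum(free_choices)

choices = [10, 20, 10]                      # persons 1 to 3 choose freely
person_4 = forced_last_value(50, choices)
print(person_4)                             # 10

# Same idea with a mean: deviations from the mean always sum to zero,
# so knowing any n - 1 of them pins down the last one.
values = choices + [person_4]
mean = sum(values) / len(values)            # 12.5
deviations = [v - mean for v in values]
print(sum(deviations))                      # 0.0
```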

Confidence Interval for Small Samples

Once we have a df and our confidence level, we head to page 373 in our book or head to this link.
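If no table is handy, the t value can also be approximated numerically. The sketch below uses only the standard library; `t_pdf`, `t_cdf`, and `t_critical` are hypothetical helper names, and the method (Simpson's rule plus bisection) is just one simple way to do the lookup, not how the book's table was produced. The search assumes the answer is below 100, which holds for everyday confidence levels.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with `df` degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0: 0.5 plus the area from 0 to x (Simpson's rule)."""
    h = x / steps
    total = t_pdf(0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(confidence, df):
    """Two-sided t value: the middle `confidence` share of the curve lies within +/- t."""
    target = 1 - (1 - confidence) / 2     # e.g. 0.975 for a 95% level
    lo, hi = 0.0, 100.0                   # assumed search range
    for _ in range(60):                   # bisection: halve the range each pass
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(t_critical(0.95, df=14), 3))  # 2.145
```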

Confidence Interval for Small Samples Example

A biology class wants to estimate the average length of a certain species of fish in a lake.

They take a random sample of 15 fish and measure their lengths. The sample mean length is 30 cm, and the sample standard deviation is 4 cm.

Calculate the 95% confidence interval for the average length of this species of fish in the lake.

Confidence Interval Example

1) Find the df and our confidence level, then locate the t value

2) Add known values to equation

3) Calculate the standard error (SE): \[\frac{s}{\sqrt{n}}\]

4) Multiply the value from step 3 by our t value

5) Subtract that from our \(\bar{x}\) for our Lower Bound CI

6) Add that to our \(\bar{x}\) for our Upper Bound CI

\[\text{CI} = \bar{x} \pm t \frac{s}{\sqrt{n}}\]

  • 95% CI: 27.785 \(\leq\) \(\mu\) \(\leq\) 32.215
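The fish example can be checked the same way. In this sketch, `t_interval` is a hypothetical helper, and the t value of 2.145 is the standard t-table entry for df = 14 at a 95% confidence level, typed in by hand just as you would read it off the table.

```python
import math

def t_interval(mean, sd, n, t_value):
    """Small-sample confidence interval: mean +/- t * sd / sqrt(n)."""
    se = sd / math.sqrt(n)       # step 3: standard error
    margin = t_value * se        # step 4: margin of error
    return mean - margin, mean + margin   # steps 5 and 6

n = 15
df = n - 1                       # 14 degrees of freedom
t_value = 2.145                  # t-table value for df = 14, 95% confidence

lower, upper = t_interval(mean=30, sd=4, n=n, t_value=t_value)
print(f"95% CI: {lower:.2f} to {upper:.2f} cm")
# 95% CI: 27.78 to 32.22 cm
```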

That’s all folks! Have a great weekend!